Streaming Algorithms from Precision Sampling
Authors
Abstract
A technique introduced by Indyk and Woodruff [STOC 2005] has inspired several recent advances in data-stream algorithms. We show that a number of these results follow easily from the application of a single probabilistic method called Precision Sampling. Using this method, we obtain simple data-stream algorithms that maintain a randomized sketch of an input vector $x = (x_1, \ldots, x_n)$, which is useful for the following applications:
• Estimating the $F_k$-moment of $x$, for $k > 2$.
• Estimating the $\ell_p$-norm of $x$, for $p \in [1, 2]$, with small update time.
• Estimating cascaded norms $\ell_p(\ell_q)$ for all $p, q > 0$.
• $\ell_1$ sampling, where the goal is to produce an element $i$ with probability (approximately) $|x_i| / \|x\|_1$. It extends to similarly defined $\ell_p$-sampling, for $p \in [1, 2]$.
For all these applications the algorithm is essentially the same: scale the vector $x$ entry-wise by a well-chosen random vector, and run a heavy-hitter estimation algorithm on the resulting vector. Our sketch is a linear function of $x$, thereby allowing general updates to the vector $x$.
Precision Sampling itself addresses the problem of estimating a sum $\sum_{i=1}^{n} a_i$ from weak estimates of each real $a_i \in [0, 1]$. More precisely, the estimator first chooses a desired precision $u_i \in (0, 1]$ for each $i \in [n]$, and then it receives an estimate of every $a_i$ within additive $u_i$. Its goal is to provide a good approximation to $\sum_i a_i$ while keeping a tab on the "approximation cost" $\sum_i (1/u_i)$. Here we refine previous work [Andoni, Krauthgamer, and Onak, FOCS 2010] which shows that as long as $\sum_i a_i = \Omega(1)$, a good multiplicative approximation can be achieved using total precision of only $O(n \log n)$.
Work done in part while the author was a postdoctoral researcher at Princeton University/CCI, supported by NSF CCF 0832797. Supported in part by The Israel Science Foundation (grant #452/08), and by a Minerva grant. Supported in part by a Simons Postdoctoral Fellowship and NSF grants 0732334 and 0728645.
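The setting described above, in which each $a_i$ is known only to within an additive $u_i$, can be illustrated with a small simulation. The sketch below is a simplified illustration and not the estimator analyzed in the paper: it assumes each precision $u_i$ is drawn uniformly from $(0, 1]$, counts the indices whose scaled estimate $\hat{a}_i / u_i$ reaches a threshold $t$, and averages $t$ times that count over independent repetitions. The function name `threshold_count_estimate` and the parameters `t`, `reps`, `noisy` and `seed` are hypothetical, and the code does not track the approximation cost $\sum_i (1/u_i)$.

```python
import random


def threshold_count_estimate(a, t=4.0, reps=400, noisy=True, seed=0):
    """Estimate sum(a) from estimates that are accurate only to within u_i.

    Assumption (illustrative, not from the paper): u_i is uniform on (0, 1].
    For a_i in [0, 1] and t >= 1, Pr[a_i / u_i >= t] = a_i / t, so
    t * (number of indices passing the threshold) has expectation sum(a)
    when the estimates are exact.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        count = 0
        for a_i in a:
            u_i = 1.0 - rng.random()  # precision u_i, uniform on (0, 1]
            # Weak estimate: off by at most u_i (simulated uniform error).
            err = rng.uniform(-u_i, u_i) if noisy else 0.0
            a_hat = a_i + err
            if a_hat / u_i >= t:
                count += 1
        total += t * count
    return total / reps


if __name__ == "__main__":
    values = [0.5] * 200  # true sum is 100
    print(threshold_count_estimate(values, noisy=False))
    print(threshold_count_estimate(values, noisy=True))
```

Because an additive error of at most $u_i$ shifts $\hat{a}_i / u_i$ by at most 1, the noisy count lies between the exact counts for thresholds $t + 1$ and $t - 1$, so in expectation the estimate stays within a factor between $t/(t+1)$ and $t/(t-1)$ of the true sum.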
Similar resources
Hybrid algorithms for Job shop Scheduling Problem with Lot streaming and A Parallel Assembly Stage
In this paper, a Job shop scheduling problem with a parallel assembly stage and Lot Streaming (LS) is considered for the first time in both machining and assembly stages. Lot Streaming technique is a process of splitting jobs into smaller sub-jobs such that successive operations can be overlapped. Hence, to solve job shop scheduling problem with a parallel assembly stage and lot streaming, deci...
Modelling and Scheduling Lot Streaming Flexible Flow Lines
Although lot streaming scheduling is an active research field, lot streaming flexible flow lines problems have received far less attention than classical flow shops. This paper deals with scheduling jobs in lot streaming flexible flow line problems. The paper mathematically formulates the problem by a mixed integer linear programming model. This model solves small instances to optimality. Moreo...
Full version of the paper Streaming Property Testing of Visibly Pushdown Languages*
In the context of formal language recognition, we demonstrate the superiority of streaming property testers against streaming algorithms and property testers, when they are not combined. Initiated by Feigenbaum et al., a streaming property tester is a streaming algorithm recognizing a language under the property testing approximation: it must distinguish inputs of the language from those that a...
Lot Streaming in No-wait Multi Product Flowshop Considering Sequence Dependent Setup Times and Position Based Learning Factors
This paper considers a no-wait multi product flowshop scheduling problem with sequence dependent setup times. Lot streaming divides the lots of products into portions called sublots in order to reduce the lead times and work-in-process, and increase the machine utilization rates. The objective is to minimize the makespan. To clarify the system, a mathematical model of the problem is presented. Sin...
Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
Journal title:
- CoRR
Volume abs/1011.1263
Publication date 2010